A ROBUST METHOD FOR SHIFT DETECTION IN TIME SERIES

HEROLD DEHLING, ROLAND FRIED, AND MARTIN WENDLER 


Abstract. We present a robust test for change-points in time series which is based on 
the two-sample Hodges-Lehmann estimator. We develop new limit theory for a class of 
statistics based on the two-sample U-quantile processes, in the case of short range dependent 
observations. Using this theory we can derive the asymptotic distribution of our test statistic 
under the null hypothesis. We study the finite sample properties of our test via a simulation 
study and compare the test with the classical CUSUM test and a test based on the Wilcoxon- 
Mann-Whitney statistic. 
1. Introduction

Statistical tests for the presence of changes in the structure of a time series are of great 
importance in a wide range of scientific discussions, e.g. regarding economic, technological
and climate data. Many procedures for detecting changes and for estimating change points
have been proposed in the statistical literature, see e.g. Csörgő and Horváth (1997) for a
detailed exposition. In the case of independent observations with normal tails, the theory 
is quite satisfactory. For a wide variety of change-point models, many statistical procedures 
have been proposed and their properties have been investigated. In contrast, the situation 
is quite different for dependent data, such as encountered in time series models, and for 
heavy-tailed data. For dependent data, most research has focused on linear procedures, such 
as CUSUM tests, and there are many open problems when it comes to other types of test 
procedures, e.g. those based on robust statistics. 

In the present paper, we study tests for detecting a level shift in a time series. Specifically,
we assume that the sequence of observations $(X_n)_{n\ge 1}$ is generated by the model

$$X_n = \mu_n + Y_n,$$

where $(\mu_n)_{n\ge 1}$ is a sequence of unknown constants and $(Y_n)_{n\ge 1}$ is a stationary process with
mean zero. We will focus on the case when $(Y_n)_{n\ge 1}$ is a weakly dependent process, in a sense
that we will specify below. As examples, we will be able to treat most standard models of
time series analysis, such as ARMA processes and GARCH processes, but our theory is not
restricted to such concrete models.

Key words and phrases. Change-point tests, shift detection, Hodges-Lehmann estimator, time series,
weakly dependent data, two-sample U-statistics, two-sample U-process, two-sample U-quantiles, functional
central limit theorem.

Given observations $X_1,\ldots,X_n$, we want to test the null hypothesis that the process is
stationary, i.e.

$$H: \mu_1 = \ldots = \mu_n,$$

against the alternative that there is a level shift at some unknown point in time, i.e.

$$A: \text{there exists } k \in \{1,\ldots,n-1\} \text{ such that } \mu_1 = \ldots = \mu_k \neq \mu_{k+1} = \ldots = \mu_n.$$


Note that in case the change-point $k$ is known in advance, the test problem becomes a
standard two-sample problem, where the first sample is $X_1,\ldots,X_k$ and the second sample is
$X_{k+1},\ldots,X_n$. This test problem is obviously much simpler than the change-point problem
studied here. At the same time, tests for the two-sample problem often serve as a guideline
for finding tests for the problem of detecting a change at an unknown point in time.

The standard test statistic for the above change-point problem is the CUSUM statistic,
which is defined as

$$\max_{1\le k\le n} \Big| \sum_{i=1}^{k} X_i - \frac{k}{n} \sum_{i=1}^{n} X_i \Big|.$$


The asymptotic distribution of this test statistic under the null hypothesis can be derived
from a functional central limit theorem for the partial sum process
$\big(\frac{1}{\sqrt{n}}\sum_{i=1}^{[n\lambda]} X_i\big)_{0\le\lambda\le 1}$. In the
case when the noise process $(Y_i)_{i\ge 1}$ is i.i.d. with finite variance, this is Donsker’s invariance
principle. Similar results have been obtained for a wide range of short range dependent
processes $(X_i)_{i\ge 1}$; see e.g. Ibragimov and Linnik (1961), Bradley (2007) and Dedecker et al.
(2007) for various functional central limit theorems for a large class of weakly dependent
processes. In this case, the partial sum process will converge in distribution to a Brownian
motion $(\sigma W(\lambda))_{0\le\lambda\le 1}$, where $\sigma^2 = \mathrm{Var}(Y_1) + 2\sum_{k=2}^{\infty} \mathrm{Cov}(Y_1, Y_k)$ is the long-run variance.
As an application of the continuous mapping theorem, we thus find that

$$\max_{k=1,\ldots,n} \frac{1}{\sqrt{n}} \Big| \sum_{i=1}^{k} X_i - \frac{k}{n} \sum_{i=1}^{n} X_i \Big| \xrightarrow{\mathcal{D}} \sigma \sup_{0\le\lambda\le 1} |W(\lambda) - \lambda W(1)|.$$


Hence we can calculate approximate critical values for the CUSUM test from tables of the
Kolmogorov-Smirnov distribution, i.e. the distribution of the supremum of the Brownian
bridge process $(W(\lambda) - \lambda W(1))_{0\le\lambda\le 1}$, provided we have a consistent estimator of the
asymptotic variance $\sigma^2$. Such estimators have been proposed in the literature; see e.g. Dehling,
Fried, Sharipov, Vogel and Wornowizki (2013) for a recent result under dependence
conditions relevant for this paper.

The CUSUM test is based on partial sums and is thus not robust to outliers in the data.
In this paper, we will propose a robust alternative to the CUSUM test and investigate its
properties. Our test will be valid without any moment assumptions on the underlying data,
and can thus be applied to arbitrarily heavy-tailed data. In order to motivate our test, we
note that using some elementary algebra, we obtain the following alternative representation
of the CUSUM test statistic:

$$\max_{1\le k\le n} \frac{1}{\sqrt{n}} \Big| \sum_{i=1}^{k} X_i - \frac{k}{n} \sum_{i=1}^{n} X_i \Big| = \sqrt{n} \max_{1\le k<n} \frac{k}{n}\Big(1 - \frac{k}{n}\Big) \Big| \frac{1}{k}\sum_{i=1}^{k} X_i - \frac{1}{n-k}\sum_{i=k+1}^{n} X_i \Big|.$$
On the right hand side we have the term $\frac{1}{k}\sum_{i=1}^{k} X_i - \frac{1}{n-k}\sum_{i=k+1}^{n} X_i$, which is the
standard estimator for the difference of the expected values of the two samples $X_1,\ldots,X_k$ and
$X_{k+1},\ldots,X_n$, in a Gaussian two sample model of i.i.d. data.
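
The algebraic identity between the two forms of the CUSUM statistic can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and the simulated series are illustrative assumptions, and the statistic is left unstandardized (it is not divided by an estimate of $\sigma$).

```python
import math
import random


def cusum_statistic(x):
    """Partial-sum form: max_k n^{-1/2} |sum_{i<=k} x_i - (k/n) sum_{i<=n} x_i|."""
    n = len(x)
    total = sum(x)
    partial = 0.0
    best = 0.0
    for k in range(1, n + 1):
        partial += x[k - 1]
        best = max(best, abs(partial - k / n * total))
    return best / math.sqrt(n)


def cusum_via_means(x):
    """Two-sample form: sqrt(n) max_k (k/n)(1-k/n) |mean(x_1..x_k) - mean(x_{k+1}..x_n)|."""
    n = len(x)
    best = 0.0
    for k in range(1, n):  # k = n contributes zero and is skipped
        left_mean = sum(x[:k]) / k
        right_mean = sum(x[k:]) / (n - k)
        best = max(best, k / n * (1 - k / n) * abs(left_mean - right_mean))
    return math.sqrt(n) * best


# Hypothetical example data: Gaussian noise with a level shift of height 1 after t = 60.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) + (1.0 if t >= 60 else 0.0) for t in range(100)]
print(cusum_statistic(x), cusum_via_means(x))  # the two values agree up to rounding
```

The second form makes explicit that the CUSUM statistic compares the arithmetic means of the two segments, which is the ingredient that the robust test replaces.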

In our paper, we propose a test that is based on the Hodges-Lehmann two sample estimator,
in the same way that the CUSUM test is based on the difference of the arithmetic means
of the two samples. In a classical two-sample problem with independent samples $X_1,\ldots,X_{n_1}$
and $Y_1,\ldots,Y_{n_2}$, the Hodges-Lehmann estimator is defined as

$$\operatorname{med}\{(Y_j - X_i) : 1\le i\le n_1,\ 1\le j\le n_2\},$$

see Hodges and Lehmann (1963). The Hodges-Lehmann estimator is robust, while having a 
high efficiency in the case of Gaussian observations. Asymptotic normality of the Hodges- 
Lehmann estimator has been established under a wide range of assumptions. Recently, Fried 
and Dehling (2011) explored the good robustness properties of two-sample tests based on 
this estimator, and Dehling and Fried (2012) proved its asymptotic normality in the case of 
short range dependent observations. 
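
As a small illustration of the robustness just described, the sketch below computes the two-sample Hodges-Lehmann estimator as the median of all pairwise differences and contrasts it with the difference of means. The function name and the toy data (including the artificial outlier) are assumptions made only for this example.

```python
from statistics import mean, median


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences y_j - x_i (two-sample Hodges-Lehmann estimator)."""
    return median(yj - xi for xi in x for yj in y)


# Toy samples: the second sample is shifted by roughly 1; the first contains one gross outlier.
x = [0.1, -0.4, 0.3, 0.0, 25.0]
y = [1.2, 0.8, 1.1, 0.9, 1.0]

print(hodges_lehmann_shift(x, y))  # stays close to the bulk shift
print(mean(y) - mean(x))           # dominated by the single outlier
```

The naive computation above uses all $n_1 n_2$ differences and is meant for small samples only.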

We propose the Hodges-Lehmann change-point test statistic, which we define as

$$T_n := \sqrt{n} \max_{1\le k<n} \frac{k}{n}\Big(1 - \frac{k}{n}\Big) \big| \operatorname{med}\{(X_j - X_i) : 1\le i\le k,\ k+1\le j\le n\} \big|.$$

The Hodges-Lehmann change-point test will reject the null hypothesis for large
values of $T_n$. In this paper, we will derive asymptotic distribution theory for a
class of test statistics, of which the Hodges-Lehmann test statistic is a special example. As 
an application of our general results, we can determine the asymptotic distribution of the 
Hodges-Lehmann change-point test under very general conditions. 
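
The test statistic can be computed directly from its definition. The sketch below is a naive implementation under that reading (median of $X_j - X_i$ over the two segments, weighted by $\frac{k}{n}(1-\frac{k}{n})$); the function name and the simulated data are illustrative assumptions, and the statistic is not yet standardized by the nuisance parameters.

```python
import math
import random
from statistics import median


def hl_changepoint_statistic(x):
    """sqrt(n) * max_k (k/n)(1-k/n) |med{x_j - x_i : i <= k < j}| (unstandardized)."""
    n = len(x)
    best = 0.0
    for k in range(1, n):
        # All differences between the segment after k and the segment up to k.
        diffs = [xj - xi for xi in x[:k] for xj in x[k:]]
        best = max(best, k / n * (1 - k / n) * abs(median(diffs)))
    return math.sqrt(n) * best


# Hypothetical data: the same noise sequence with and without a level shift in the middle.
rng = random.Random(3)
noise = [rng.gauss(0.0, 1.0) for _ in range(60)]
shifted = [v + (5.0 if t >= 30 else 0.0) for t, v in enumerate(noise)]
print(hl_changepoint_statistic(noise), hl_changepoint_statistic(shifted))
```

This direct approach recomputes a median for every candidate $k$ and is only practical for short series; it is intended to make the definition concrete rather than to be efficient.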


Theorem 1.1. Let $(Y_i)_{i\ge 1}$ be a stationary process that is a near epoch dependent functional
of an absolutely regular process $(Z_n)_{n\in\mathbb{Z}}$ with mixing coefficients $(\beta(n))_{n\ge 1}$ and approximating
constants $(a_n)_{n\ge 1}$ satisfying $\beta(n) = O(n^{-[\ldots]})$ and $a_n = [\ldots]$. Moreover, let $Y_1$ have an
absolutely continuous distribution with density $f(x)$ and assume that $u(x) = \int f(y) f(x+y)\,dy$
is $\frac{1}{2}$-Hölder continuous. Then, under the null hypothesis of no change, we obtain

$$\sqrt{n} \max_{1\le k<n} \frac{k}{n}\Big(1 - \frac{k}{n}\Big) \big| \operatorname{med}\{(X_j - X_i) : 1\le i\le k,\ k+1\le j\le n\} \big| \xrightarrow{\mathcal{D}} \frac{\sigma}{u(0)} \sup_{0\le\lambda\le 1} |W^{(0)}(\lambda)|,$$

where

$$\sigma^2 = \sum_{k=-\infty}^{\infty} \mathrm{Cov}(F(X_k), F(X_0)),$$

and where $(W^{(0)}(\lambda))_{0\le\lambda\le 1}$ denotes a standard Brownian bridge process.


In order to apply the above theorem, we have to provide consistent estimators for the
nuisance parameters $\sigma$ and $u(0)$. We use a subsampling estimator for $\sigma$ that was proposed
by Dehling and Fried (2012). We choose a block length $l = l_n$ in such a way that […] $= o(1)$.
We then define the estimator

$$\hat\sigma = \frac{\sqrt{\pi}}{\sqrt{2 l_n}\,[n/l_n]} \sum_{i=1}^{[n/l_n]} |S_i(l_n)|,$$

where $S_i(l) = \sum_{j=(i-1)l+1}^{il} \big(F_n(X_j) - \frac{1}{2}\big)$. Dehling, Fried, Sharipov, Vogel and Wornowizki
(2013) have established consistency of this estimator under the same assumptions as made 
in the present paper. We use a kernel density estimator for $u(0)$. Observing that $u(x)$
is the density of $X - Y$, where $X$ and $Y$ are independent random variables with the same
distribution as $X_1$, we use a kernel density estimator based on the pairwise differences $X_i - X_j$,
$1\le i<j\le n$. In this way we get

$$\hat u(0) = \frac{2}{n(n-1)b} \sum_{1\le i<j\le n} K\Big(\frac{X_i - X_j}{b}\Big)$$

for a symmetric, Lipschitz-continuous kernel function $K$ which integrates to 1. Below, we
will show that $\hat u(0)$ is a consistent estimator of $u(0)$ under $H$, provided that the bandwidth
$b = b_n$ is chosen appropriately.
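
The two nuisance-parameter estimators can be sketched as follows. This is a hedged illustration rather than the authors' code: the subsampling formula follows the reading $\hat\sigma = \frac{\sqrt{\pi}}{\sqrt{2l}\,[n/l]}\sum_i |S_i(l)|$ with $S_i(l)$ a block sum of $F_n(X_j) - \frac12$, the Gaussian kernel is one admissible choice of $K$ (an assumption), and the block length and bandwidth in the example are arbitrary.

```python
import bisect
import math
import random


def subsampling_sigma(x, block_length):
    """Subsampling estimate of sigma from non-overlapping block sums of F_n(x_j) - 1/2."""
    n = len(x)
    ordered = sorted(x)
    # Empirical distribution function evaluated at each observation.
    f = [bisect.bisect_right(ordered, v) / n for v in x]
    blocks = n // block_length
    total = 0.0
    for i in range(blocks):
        start = i * block_length
        total += abs(sum(f[j] - 0.5 for j in range(start, start + block_length)))
    return math.sqrt(math.pi) / (math.sqrt(2 * block_length) * blocks) * total


def kernel_density_at_zero(x, bandwidth):
    """Kernel estimate of the density of X_i - X_j at 0, using a Gaussian kernel (assumed)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            z = (x[i] - x[j]) / bandwidth
            total += math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return 2.0 * total / (n * (n - 1) * bandwidth)


# Hypothetical i.i.d. standard normal data: sigma^2 = 1/12 and u(0) = 1 / (2 sqrt(pi)).
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
print(subsampling_sigma(x, block_length=9), math.sqrt(1 / 12))
print(kernel_density_at_zero(x, bandwidth=0.3), 1 / (2 * math.sqrt(math.pi)))
```

For independent standard normal observations the targets are known in closed form, which gives a quick plausibility check of both estimators.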


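Corollary 1.2. Under the same assumptions as in the above theorem, we obtain under the
null hypothesis that

$$\frac{\hat u(0)}{\hat\sigma} \sqrt{n} \max_{1\le k<n} \frac{k}{n}\Big(1 - \frac{k}{n}\Big) \big| \operatorname{med}\{(X_j - X_i) : 1\le i\le k,\ k+1\le j\le n\} \big|$$

converges in distribution to $\sup_{0\le\lambda\le 1} |W^{(0)}(\lambda)|$, where $(W^{(0)}(\lambda))_{0\le\lambda\le 1}$ denotes a standard Brownian
bridge process.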
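
Since the limit is the supremum of the absolute value of a Brownian bridge, asymptotic p-values can be obtained from the classical series for the Kolmogorov distribution, $P(\sup_\lambda |W^{(0)}(\lambda)| > x) = 2\sum_{k\ge 1} (-1)^{k+1} e^{-2k^2x^2}$. The sketch below evaluates a truncated version of this series; the function name and the truncation at 100 terms are assumptions for illustration.

```python
import math


def kolmogorov_tail(x, terms=100):
    """Approximate P(sup |Brownian bridge| > x) by truncating the alternating series."""
    if x <= 0:
        return 1.0
    s = 0.0
    for k in range(1, terms + 1):
        s += (-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x)
    # Clip to [0, 1] to guard against truncation error for very small x.
    return min(1.0, max(0.0, 2.0 * s))


# The critical value 1.36 corresponds to an asymptotic level of about 5%.
print(kolmogorov_tail(1.36))
```

A standardized statistic exceeding 1.36 therefore leads to rejection at the asymptotic 5% level.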


The asymptotic distribution of the Hodges-Lehmann test statistic can be derived from a
study of the process $\sqrt{n}\,\lambda(1-\lambda) \operatorname{med}\{(X_j - X_i) : 1\le i\le [n\lambda],\ [n\lambda]+1\le j\le n\}$, $0\le\lambda\le 1$.
More generally, we are led to the study of the process of quantiles of the values

$$g(X_i, X_j), \quad 1\le i\le [n\lambda],\ [n\lambda]+1\le j\le n,$$

indexed by $0\le\lambda\le 1$, where $g(x,y)$ is a given function of two variables. In the present
paper we investigate the asymptotic distribution of this process in the case of short range 
dependent data. 


2. Main Theoretical Results 

2.1. Near Epoch Dependent Processes. We will derive the asymptotic results in this
paper under the assumption of short range dependence. In the literature, there is a wide
range of notions that formally capture the idea of short range dependent processes. The
classical approach is to impose mixing conditions, such as strong mixing, absolute regularity
and uniform mixing, also known as $\alpha$-mixing, $\beta$-mixing and $\phi$-mixing, respectively. This
approach was initiated by the seminal paper of Rosenblatt (1956), where strongly mixing
processes were introduced and where a central limit theorem for partial sums of strongly
mixing processes was proved. For a survey of various mixing concepts and associated limit
theorems for partial sums, see the monographs by Doukhan (1994) and Rio (2000), as well
as the encyclopedic three volume monograph by Bradley (2007).

Mixing concepts provided a unifying structure that allowed establishing limit theorems for
a wide range of stochastic processes that previously could be treated only by ad hoc methods.
Among these processes are, e.g., ARMA-processes with a continuous innovation distribution,
stationary ergodic Markov chains, and the process of digits in a continued fraction expansion. At
the same time there are very important processes that are not mixing. Among the simplest
examples are AR(1)-processes with discrete innovations, as was pointed out by Andrews
(1984). The largest class of non-mixing processes are deterministic dynamical systems, i.e.
processes defined as $X_n = T(X_{n-1})$, where $T : \mathcal{X} \to \mathcal{X}$ is a map on some state space $\mathcal{X}$ and
where $X_0$ is a random variable. Such processes do not satisfy any of the classical mixing
conditions, but yet under some assumptions on the map $T$ and the distribution of $X_0$, they
satisfy many of the classical limit theorems. In order to overcome these obvious shortcomings
of mixing concepts, Doukhan and Louhichi (1999) suggested various new notions of weakly
dependent processes, which are based on covariance inequalities for suitable functions of
blocks of random variables that are separated in time. In their paper and in subsequent
publications of various authors, a large variety of limit theorems was established for processes
satisfying these notions. For a comprehensive survey, see the monograph by Dedecker et al.
(2007).

In the present paper, we follow a more classical approach that has been used already 
by Billingsley (1968) and Ibragimov and Linnik (1971). We assume that the noise process 
is near epoch dependent (NED) on an absolutely regular process. 

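Definition 2.1. (i) Let $\mathcal{A}, \mathcal{B} \subset \mathcal{F}$ be two $\sigma$-fields on the probability space $(\Omega, \mathcal{F}, P)$. We
define the absolute regularity coefficient $\beta(\mathcal{A},\mathcal{B})$ by

$$\beta(\mathcal{A},\mathcal{B}) = E\Big(\sup_{A\in\mathcal{A}} |P(A|\mathcal{B}) - P(A)|\Big).$$

(ii) For a stationary process $(Z_n)_{n\in\mathbb{Z}}$ we define the absolute regularity coefficients

$$\beta(k) = \sup_{n\ge 1} \beta(\mathcal{G}_1^n, \mathcal{G}_{n+k}^{\infty}),$$

where $\mathcal{G}_k^l$ denotes the $\sigma$-field generated by the random variables $Z_k,\ldots,Z_l$. The process
$(Z_n)_{n\in\mathbb{Z}}$ is called absolutely regular if $\beta(k) \to 0$ as $k \to \infty$.

(iii) Let $((X_n, Z_n))_{n\in\mathbb{Z}}$ be a stationary process. We say that $(X_n)_{n\ge 0}$ is $L^1$-near epoch dependent
on the process $(Z_n)_{n\in\mathbb{Z}}$ with approximating constants $(a_l)_{l\ge 0}$ if

$$E|X_0 - E(X_0 | \mathcal{G}_{-l}^{l})| \le a_l,$$

and $\lim_{l\to\infty} a_l = 0$.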

If the process $(X_n)_{n\ge 0}$ is near epoch dependent on the process $(Z_n)_{n\in\mathbb{Z}}$, we get by definition
the representation $X_0 = f((Z_n)_{n\in\mathbb{Z}})$ for some measurable function $f : \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$. By
stationarity, we thus obtain a representation

$$X_k = f((Z_{n+k})_{n\in\mathbb{Z}}).$$

Thus, a process that is NED on an absolutely regular process is also called a functional of 
an absolutely regular process. The class of processes that are NED on an absolutely regular 
process contains all relevant processes from time series analysis as well as many dynamical 
systems; see e.g. Borovkova, Burton and Dehling (2001) for a detailed list of examples. 


2.2. Two-sample empirical U-quantile process. In this paper, we will investigate the
two-sample empirical quantile process associated with the kernel $g(x,y)$. We will now formally
define this process, as well as the related two-sample empirical U-process, both in a
slightly more general setup of empirical processes indexed by classes of functions.

Definition 2.2. Let $h : \mathbb{R}^2 \times \mathbb{R} \to [0,1]$ be a measurable function, and let $(X_i)_{i\ge 1}$ be a
stochastic process.

(i) We define the two-sample empirical U-process

$$U_n(\lambda,t) = \frac{1}{[n\lambda](n - [n\lambda])} \sum_{i=1}^{[n\lambda]} \sum_{j=[n\lambda]+1}^{n} h(X_i, X_j, t), \quad 0 < \lambda < 1,\ t\in\mathbb{R}.$$

(ii) Given $p \in [0,1]$, we define the two-sample empirical U-quantile process

$$Q_n(\lambda,p) = \inf\{t : U_n(\lambda,t) \ge p\}, \quad 0 < \lambda < 1.$$
Remark 2.3. (i) Given a kernel $g(x,y)$, we can define

$$h(x,y,t) = 1_{\{g(x,y)\le t\}}.$$

Then, $U_n(\lambda, \cdot)$ is the empirical distribution function of the data $g(X_i, X_j)$, $1\le i\le [n\lambda]$, $[n\lambda]+1\le j\le n$,
and $Q_n(\lambda,p)$ is the $p$-th quantile of the same data.

(ii) For fixed $t$, the process $(U_n(\lambda,t))_{0\le\lambda\le 1}$ is a two-sample U-process that has been introduced
and investigated by Dehling, Fried, Garcia and Wendler (2013).
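
For the indicator kernel of Remark 2.3 (i), both processes can be evaluated directly from the pairwise values $g(X_i, X_j)$. The sketch below assumes $0 < \lambda < 1$ with $1 \le [n\lambda] \le n-1$ and $0 < p \le 1$; the function names and the toy data are illustrative assumptions.

```python
import math


def two_sample_values(x, lam, g):
    """Sorted values g(x_i, x_j) for i <= [n*lam] < j."""
    k = math.floor(len(x) * lam)
    return sorted(g(xi, xj) for xi in x[:k] for xj in x[k:])


def u_process(x, lam, t, g):
    """U_n(lam, t): empirical distribution function of the values g(x_i, x_j) at t."""
    vals = two_sample_values(x, lam, g)
    return sum(v <= t for v in vals) / len(vals)


def u_quantile(x, lam, p, g):
    """Q_n(lam, p) = inf{t : U_n(lam, t) >= p}, i.e. the ceil(p*N)-th smallest value."""
    vals = two_sample_values(x, lam, g)
    idx = max(math.ceil(p * len(vals)) - 1, 0)
    return vals[idx]


def difference(a, b):
    """Kernel g(x, y) = y - x, which leads to the Hodges-Lehmann estimator for p = 1/2."""
    return b - a


x = [1, 2, 3, 10, 11, 12]
print(u_quantile(x, 0.5, 0.5, difference))   # sample median of the nine differences
print(u_process(x, 0.5, 9, difference))      # fraction of differences <= 9
```

With $g(x,y) = y - x$ and $p = 1/2$, `u_quantile` returns a sample median of the pairwise differences between the two segments.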

In this paper, we will study the asymptotic distribution of the two-sample empirical U-quantile
process $(Q_n(\lambda,p))_{0\le\lambda\le 1}$ in the case of weakly dependent data $(X_i)_{i\ge 1}$. Before we can
formulate the results, we have to make some further definitions.

Definition 2.4. Let $h(x,y,t)$ be a measurable function, and let $X, Y$ be independent random
variables with the same distribution as $X_1$. Then we define the functions $U(t)$, $h_1(x,t)$, and
$h_2(y,t)$ by

(1) $U(t) = E h(X,Y,t)$

(2) $h_1(x,t) = E h(x,Y,t) - U(t)$

(3) $h_2(y,t) = E h(X,y,t) - U(t)$.

Moreover, we define the quantile function

$$Q(p) = \inf\{t : U(t) \ge p\},$$

and the $p$-th quantile $t_p = Q(p)$.

Our theorems will require various technical assumptions regarding the process $(X_i)_{i\ge 1}$ and
the kernel $h(x,y,t)$, which we list now.

(A1) $(X_n)_{n\ge 1}$ is a near epoch dependent functional of an absolutely regular process $(Z_n)_{n\in\mathbb{Z}}$
with mixing coefficients $(\beta(n))_{n\ge 1}$ and approximation constants $(a_n)_{n\ge 1}$ satisfying

$$\beta(n) = O(n^{-\beta}), \qquad a_n = [\ldots]$$

for some constant $\beta > 3$.

(A2) $U(t)$, as defined in (1), is differentiable in a neighborhood of $t_p$. Moreover, $u(t) =
U'(t)$ satisfies $u(t_p) > 0$, and

$$|U(t) - p - u(t_p)(t - t_p)| = O(|t - t_p|^{3/2}),$$

as $t \to t_p$.

(A3) The kernel $h : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$ is a bounded measurable function. Moreover, $t \mapsto h(x,y,t)$
is nondecreasing, and $(x,y) \mapsto h(x,y,t)$ is uniformly p-Lipschitz continuous in a
neighborhood of $t_p$, i.e., there exists a neighborhood of $t_p$ and a constant $L > 0$ such
that

$$E\big(|h(X,Y,t) - h(X',Y,t)|\,1_{\{|X-X'|\le\epsilon\}}\big) \le L\epsilon$$
$$E\big(|h(X,Y,t) - h(X,Y',t)|\,1_{\{|Y-Y'|\le\epsilon\}}\big) \le L\epsilon$$

holds for all $t$ in this neighborhood, for all $\epsilon > 0$, and for all quadruples $X, Y, X', Y'$
of random variables such that $(X,Y)$ has joint distribution $P_{X_1} \times P_{X_1}$ or $P_{X_1,X_k}$, for
some $k$, and such that $X'$ and $Y'$ each have the same marginal distribution as $X_1$.
Theorem 2.5. Let $(X_i)_{i\ge 1}$ be a near epoch dependent functional of an absolutely regular
process such that assumption (A1) is satisfied. Moreover, let $h : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$ be a measurable
kernel such that assumptions (A2) and (A3) hold. Then

$$\big(\sqrt{n}\,\lambda(1-\lambda)(Q_n(\lambda,p) - Q(p))\big)_{0\le\lambda\le 1} \xrightarrow{\mathcal{D}} \big((1-\lambda)W_1(\lambda) + \lambda(W_2(1) - W_2(\lambda))\big)_{0\le\lambda\le 1},$$

where $(W_1(\lambda), W_2(\lambda))$ is a two-dimensional Brownian motion with covariance structure

$$\mathrm{Cov}(W_i(\mu), W_j(\lambda)) = \frac{\mu\wedge\lambda}{u^2(Q(p))} \sum_{k\in\mathbb{Z}} E\big(h_i(X_0; Q(p))\, h_j(X_k; Q(p))\big).$$

Here $u(t) = \frac{d}{dt} U(t)$.


An important ingredient in the proof of the limit theorem for the two-sample U-quantile
process is the Bahadur-Kiefer representation of the U-quantiles, see Bahadur (1966). The
Bahadur representation for two-sample U-quantiles (with fixed $\lambda$) has been studied by
Inagaki (1973) for independent data and by Dehling and Fried (2012) for dependent data.
To the best of our knowledge, there are no results for the process indexed by $\lambda$. There
is much more literature on one-sample U-quantiles, beginning with Geertsema (1970). In
this case, better rates of the Bahadur representation are known, see Dehling et al. (1987), 
Choudhury and Serfling (1988) and Arcones (1996) for the independent case, Wendler (2011) 
for the dependent case. 


Theorem 2.6. Under the same assumptions as in Theorem 2.5, we obtain

$$\sup_{0\le\lambda\le 1} \lambda(1-\lambda) \Big| Q_n(\lambda,p) - Q(p) + \frac{U_n(\lambda, Q(p)) - p}{u(Q(p))} \Big| = O_P(n^{-[\ldots]}).$$

Here $u(t) = \frac{d}{dt} U(t)$.


3. Simulation Results 

The practical value of the theoretical results presented above is illustrated in a simulation
study. We generate time series of length $n = 200$ from first order autoregressive models with
parameter $\phi \in \{0, 0.4, 0.8\}$ and the innovations stemming from scaled t-distributions with
$\nu \in \{2, 3, \infty\}$ degrees of freedom. In case of $\nu = \infty$ this is a standard normal distribution,
while $\nu = 3$ and $\nu = 2$ correspond to the cases where the variance just exists or does not
exist. All t-distributions are scaled to have $F_\nu(1) = 0.8413447$, like for the standard normal,
to ease comparison.

For estimation of the long-run variance needed for our change-point test based on the
Hodges-Lehmann estimator (HLE), we consider the fixed block length $l_n = [(3n)^{1/3} + 1]$,
corresponding to $l_n = 9$ when $n = 200$. This agrees well with the findings of Dehling et al.
(2013) for ARMA(1,1) processes. Additionally, we consider the adaptive block length

(4) $l_n = \max\big(\big[n^{1/3}(2\phi/(1 - \phi^2))^{2/3}\big], 1\big)$.

Carlstein (1986) proved that this block length minimizes the MSE of the estimator for
the long-run variance of the CUSUM test asymptotically in the case of an AR(1)-process with
autoregression coefficient $\phi$. Dehling et al. (2013) obtained good results also when applying
this adaptive block length for subsampling estimation of the long-run variance of their
change-point test based on the Wilcoxon-Mann-Whitney (WMW) statistic, replacing $\phi$ by
the lag-one sample autocorrelation of the series $F_n(x_t)$, $t = 1,\ldots,n$.
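
The two block-length rules can be written down in a few lines. The sketch below reads the brackets $[\cdot]$ as the integer part, which reproduces $l_n = 9$ for $n = 200$; that reading, the use of the absolute value for negative autocorrelations, and the function names are assumptions of this illustration.

```python
import math


def fixed_block_length(n):
    """Fixed rule l_n = [(3n)^(1/3) + 1], with [.] read as the integer part."""
    return math.floor((3 * n) ** (1 / 3) + 1)


def adaptive_block_length(n, phi):
    """Carlstein-type rule l_n = max([n^(1/3) (2 phi / (1 - phi^2))^(2/3)], 1), |phi| < 1."""
    ratio = abs(2 * phi / (1 - phi ** 2))  # absolute value keeps the fractional power real
    return max(math.floor(n ** (1 / 3) * ratio ** (2 / 3)), 1)


def lag_one_autocorrelation(x):
    """Plug-in estimate of phi: lag-one sample autocorrelation of the series x."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


print(fixed_block_length(200))
print([adaptive_block_length(200, phi) for phi in (0.0, 0.4, 0.8)])
```

In practice the argument `phi` would be replaced by `lag_one_autocorrelation` applied to the transformed series $F_n(x_t)$, as described above.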

The sizes of the different tests under the different error distributions are assessed by
applying the tests with the critical value 1.36, corresponding to an asymptotic 5% significance
level, to 4000 time series without shift. Then we generate 400 time series for each of
several shift heights, namely $h = 0.1, 0.2, \ldots, 1$ in case of $\phi = 0$, $h = 0.2, 0.4, \ldots, 2$ under $\phi = 0.4$ and
$h = 0.4, 0.8, \ldots, 4$ under $\phi = 0.8$, to compare the power of the different tests.

Table 1. Empirical sizes (in percent) of the different tests for differently strong
autocorrelations ($\phi$) and heaviness of the tails ($\nu$).

                       fixed block length        adaptive block length
$\phi$    $\nu$        CUSUM    WMW     HLE      CUSUM    WMW     HLE
0.0       $\infty$       3.9    2.9     3.6        3.8    2.2     2.8
0.0       3              3.5    2.9     3.7        3.1    2.4     2.8
0.0       2              3.4    3.1     4.8        2.5    2.2     3.4
0.4       $\infty$       4.9    3.1     3.8        6.0    3.9     4.3
0.4       3              3.8    3.0     3.9        4.9    3.3     4.0
0.4       2              3.6    3.0     4.5        4.2    3.8     5.1
0.8       $\infty$      10.6    6.5     7.1        4.0    2.5     2.9
0.8       3             10.5    7.0     7.7        3.9    2.8     3.1
0.8       2              8.8    6.7    10.7        3.7    2.3     5.7

Figure 1 illustrates the power of the different tests in case of a shift of increasing height in
time series with normal innovations. As expected, the CUSUM is usually the most powerful 
test under normality. In case of a shift in the center of the time series, the WMW and the 
HLE test are close competitors, with the latter providing good power also when the shift is 
not in the center. The adaptive block length for the subsampling increases the power in case 
of small or moderate positive autocorrelations, particularly for a shift outside the center, 
and it stabilizes the size of the tests in case of large positive autocorrelations. Note that the 
HLE test with adaptive block length performs even better than the corresponding CUSUM 
test in case of strong positive autocorrelations. Estimation of the asymptotic variance is 
less vulnerable to shifts in case of the HLE test than for the CUSUM, since we deal with 
autocovariances of random variables which are transformed to the bounded interval $[0,1]$.

Figure 2 compares the power of the different tests in case of a shift of increasing height in
time series with $t_3$-distributed innovations. The asymptotic theory underlying the CUSUM
still applies for this heavy-tailed distribution, since the variance exists. Nevertheless, our
implementations of the CUSUM test provide smaller power than the WMW and HLE tests.
Again we find little difference between the WMW and the HLE test if a shift occurs in the
center of the series, and a substantial advantage of the HLE test if the shift occurs outside
the center. The increased power in case of small positive autocorrelations and the better
size preservation in case of large positive autocorrelations resulting from the adaptive block
length is also confirmed. We get similar results for the situation of $t_2$-distributed innovations,
see Figure 3. Although the asymptotic theory underlying the CUSUM test does not apply, it
still preserves the size if the autocorrelations are moderate, but the advantages of the WMW
and even more the HLE test in terms of power increase as the tails get heavier.

Table 1 indicates that the better power of the HLE test as compared to the WMW test
is in part due to its size being closer to the nominal significance level. The WMW test is
more conservative, particularly if the autocorrelation $\phi$ is large or the degrees of freedom $\nu$
are small.

The tests employing Carlstein’s adaptive choice of the block length could be improved
further by using a more sophisticated estimate of $\phi$ than the sample autocorrelation
coefficient applied here. The latter is positively biased in the presence of a shift, which leads to
undesirably large choices of the block length. This negative effect becomes more severe for
larger values of $\phi$, since the plug-in estimate of the asymptotically MSE-optimal choice of
$l_n$ increases more rapidly if $\phi$ is close to 1, while it is rather stable for moderate and small
values of $\phi$. In our study, in case of $\phi = 0$ the average value chosen for $l_n$ increases from
about 2 to about 4 as the height of the shift increases, while it increases from about 6 to
about 10 if $\phi = 0.4$, and even from about 16 to about 24 if $\phi = 0.8$. A robust estimate of the
autocorrelation coefficient resisting shifts could be used, but this is left for future research.

Figure 1. Power of the tests in case of a shift in the center (left) or after 3/4
(right) of AR(1) time series with $\phi = 0$ (top), $\phi = 0.4$ (middle) or $\phi = 0.8$
(bottom) and normal innovations, length $n = 200$. CUSUM (dotted), WMW
(dashed), HLE (solid) with fixed (black) or adaptive (grey) block length for
the subsampling.

Figure 2. Power of the tests in case of a shift in the center (left) or after
3/4 (right) of AR(1) time series with $\phi = 0$ (top), $\phi = 0.4$ (middle) or $\phi =
0.8$ (bottom) and $t_3$-distributed innovations, length $n = 200$. CUSUM (dotted), WMW
(dashed), HLE (solid) with fixed (black) or adaptive (grey) block length for
the subsampling.

Figure 3. Power of the tests in case of a shift in the center (left) or after 3/4
(right) of AR(1) time series with $\phi = 0$ (top), $\phi = 0.4$ (middle) or $\phi = 0.8$ (bottom)
and $t_2$-distributed innovations, length $n = 200$. CUSUM (dotted), WMW
(dashed), HLE (solid) with fixed (black) or adaptive (grey) block length for
the subsampling.

We also considered direct estimation of the asymptotic variance after correcting the data 
for a possible shift as proposed by Hušková and Kirch (2010), but our implementation
provided substantially oversized tests. When correcting the sizes of the tests by using the
empirical 95% quantile of the absolute values of the test statistics, derived from time series
without a shift generated from the same model, the differences between the tests were less 
pronounced than those presented here. However, such a comparison is not realistic, since in 
practice we usually know neither the time series model nor the type of innovations and can 
thus not use such critical values derived from simulations. Instead, bootstrap procedures 
might be an interesting alternative, but this will not be pursued here. 

4. Data analysis 

For illustration of the gains in power arising from the HLE test as compared to the CUSUM 
and the WMW test we analyze the monthly averages of the daily minimum temperatures 
in Potsdam, Germany, from 1893 to 2010. The 1416 data points have been deseasonalized 
by subtracting the median value from each calendar month, see Figure 4. We are interested
in whether the level of this data set is constant or whether there is a monotonic change. 
Such a change is likely to show a trend-like behavior and not a jump, but nevertheless a 
change-point test should detect such a change if its null hypothesis is a constant level. 
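
The deseasonalization step described above can be sketched as follows. The function name and the argument `first_month` are hypothetical, and the toy series only mimics a purely seasonal pattern; the Potsdam data themselves are not reproduced here.

```python
from statistics import median


def deseasonalize_monthly(values, first_month=1):
    """Subtract from each monthly value the median of all values of the same calendar month."""
    by_month = {}
    for t, v in enumerate(values):
        by_month.setdefault((first_month - 1 + t) % 12, []).append(v)
    month_median = {m: median(vs) for m, vs in by_month.items()}
    return [v - month_median[(first_month - 1 + t) % 12] for t, v in enumerate(values)]


# Toy example: three years of a purely seasonal pattern, starting in January.
pattern = [float(m) for m in range(12)]
series = pattern * 3
print(deseasonalize_monthly(series))  # every entry is zero for a purely seasonal series
```

Using the median rather than the mean of each calendar month keeps the adjustment itself insensitive to a few extreme months.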

The empirical autocorrelation and partial autocorrelations suggest a first order autoregressive
model with lag-one autocorrelation about 0.25 for the deseasonalized data. The
CUSUM and the HLE test statistics take their maximum value in 1987 after time point
1136, while the WMW test takes it in 1943 after time point 595, i.e. rather in the middle
of the time series. The resulting p-values are 0.002, 0.002 and below 0.001 for the CUSUM,
the WMW and the HLE test with the fixed block length. All p-values are even below 0.001
when using the adaptive block length.

When dividing the data into the time periods before and after the detected change-point,
the HLE test with adaptive block length yields a further significant p-value of 0.042 after time
point 380 in 1924. For the HLE with fixed block length and the WMW with adaptive block
length the p-value is 0.062, while the CUSUM test is far from being significant and does not
reject the null hypothesis of constant levels within the subsequences. The multiple
change-points detected by the HLE test with adaptive block length might be due to a monotonic
trend and can be explained by the superior power of this test in case of heavy tailed data
like daily minima. The HLE estimates an increase of the temperatures by 0.355°C from the
first to the second and by another 0.765°C from the second to the third period.

5. Proofs 

5.1. Auxiliary Results. The proofs require some further notations, which we introduce
now. Given the kernel $h(x,y,t)$, we define the two-sample empirical U-process

$$U_{n_1,n_2}(t) := \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n_1+n_2} h(X_i, X_j, t),$$

and the two-sample empirical U-quantile process

$$Q_{n_1,n_2}(p) := U_{n_1,n_2}^{-1}(p) = \inf\{t : U_{n_1,n_2}(t) \ge p\}.$$
Figure 4. Deseasonalized time series representing the monthly average daily
minimum temperatures in Potsdam, Germany, and change-points detected by
the HLE test with adaptive block length. [Plot omitted; horizontal axis: time.]

Note that $U_n(\lambda,t) = U_{[n\lambda],\,n-[n\lambda]}(t)$ and $Q_n(\lambda,p) = Q_{[n\lambda],\,n-[n\lambda]}(p)$. Moreover, we define

$$g(x,y,t) = h(x,y,t) - h_1(x,t) - h_2(y,t) - U(t),$$

where $U(t)$, $h_1(x,t)$ and $h_2(y,t)$ have been defined in (1), (2) and (3), respectively. Thus,
we obtain the Hoeffding decomposition of the two-sample U-statistic as

$$U_{n_1,n_2}(t) = U(t) + \frac{1}{n_1}\sum_{i=1}^{n_1} h_1(X_i,t) + \frac{1}{n_2}\sum_{j=n_1+1}^{n_1+n_2} h_2(X_j,t) + \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=n_1+1}^{n_1+n_2} g(X_i,X_j,t).$$

The next two lemmas will deal with the last sum, which is called the degenerate part:


Lemma 5.1. Under the assumptions (A1) and (A3), there exists a constant $C$, such that
for any integers $0 \le m_1 < n_1 \le m_2 < n_2$

$$E\Big(\sum_{i=m_1+1}^{n_1} \sum_{j=m_2+1}^{n_2} g(X_i,X_j,t)\Big)^2 \le C (n_1 - m_1)(n_2 - m_2),$$

for all $t$ in the neighborhood referred to in assumption (A3).

Proof. For the special case $m_2 = n_1$, this is Proposition 6.2 of Dehling and Fried (2012). The
more general case can be proved with the same arguments; we omit the details. □


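Lemma 5.2. Suppose that the assumptions (A1) and (A3) hold.
(i) There is a constant $C$, such that

$$E\Big(\max_{0\le m_1<n_1\le m_2<n_2\le 2^l} \Big|\sum_{i=m_1+1}^{n_1} \sum_{j=m_2+1}^{n_2} g(X_i,X_j,t)\Big|\Big)^2 \le C\, 2^{2l}\, l^4.$$

(ii) As $n \to \infty$, we have

$$\sup_{0\le\lambda\le 1} \Big|\sum_{i=1}^{[\lambda n]} \sum_{j=[\lambda n]+1}^{n} g(X_i,X_j,t)\Big| = o(n \log^3 n)$$

almost surely.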
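Proof. To prove the first part of the lemma, we introduce the notation

$$Q_{m_1,n_1,m_2,n_2} := \sum_{i=m_1+1}^{n_1} \sum_{j=m_2+1}^{n_2} g(X_i,X_j,t)$$

for $m_1 < n_1 \le m_2 < n_2$, and $Q_{m_1,n_1,m_2,n_2} = 0$ otherwise. These quantities satisfy an addition
rule

$$Q_{m_1,n_1,m_2,n_2} + Q_{n_1,n_1',m_2,n_2} = Q_{m_1,n_1',m_2,n_2},$$
$$Q_{m_1,n_1,m_2,n_2} + Q_{m_1,n_1,n_2,n_2'} = Q_{m_1,n_1,m_2,n_2'}.$$

Note that

$$\max_{0\le m_1<n_1\le m_2<n_2\le 2^l} |Q_{m_1,n_1,m_2,n_2}| \le 2 \max_{0\le m_1<n_1\le m_2\le 2^l} |Q_{m_1,n_1,m_2,2^l}| \le 4 \max_{0\le n_1\le m_2\le 2^l} |Q_{0,n_1,m_2,2^l}|.$$

Now we use a chaining technique. For example,

$$|Q_{0,5,7,16}| \le |Q_{0,4,7,8}| + |Q_{0,4,8,16}| + |Q_{4,5,7,8}| + |Q_{4,5,8,16}|.$$

We conclude that

$$\max_{0\le m_1<n_1\le m_2<n_2\le 2^l} |Q_{m_1,n_1,m_2,n_2}| \le 4 \sum_{d_1=0}^{l} \sum_{d_2=0}^{l} \max_{\substack{i=1,\ldots,2^{l-d_1} \\ j=1,\ldots,2^{l-d_2}}} |Q_{(i-1)2^{d_1},\,i2^{d_1},\,(j-1)2^{d_2},\,j2^{d_2}}|.$$

Note that for any random variables $Y_1,\ldots,Y_k$, we have that $E(\max_{i=1,\ldots,k} |Y_i|)^2 \le \sum_{i=1}^{k} E Y_i^2$.
Using this inequality and Lemma 5.1 we conclude that

$$E\Big(\max_{0\le m_1<n_1\le m_2<n_2\le 2^l} |Q_{m_1,n_1,m_2,n_2}|\Big)^2 \le 16\, E\Big(\sum_{d_1=0}^{l} \sum_{d_2=0}^{l} \max_{\substack{i=1,\ldots,2^{l-d_1} \\ j=1,\ldots,2^{l-d_2}}} |Q_{(i-1)2^{d_1},\,i2^{d_1},\,(j-1)2^{d_2},\,j2^{d_2}}|\Big)^2$$
$$\le 16 (l+1)^2 \sum_{d_1=0}^{l} \sum_{d_2=0}^{l} E\Big(\max_{\substack{i=1,\ldots,2^{l-d_1} \\ j=1,\ldots,2^{l-d_2}}} |Q_{(i-1)2^{d_1},\,i2^{d_1},\,(j-1)2^{d_2},\,j2^{d_2}}|\Big)^2$$
$$\le 16 (l+1)^2 \sum_{d_1=0}^{l} \sum_{d_2=0}^{l} \sum_{i=1}^{2^{l-d_1}} \sum_{j=1}^{2^{l-d_2}} E\big(Q_{(i-1)2^{d_1},\,i2^{d_1},\,(j-1)2^{d_2},\,j2^{d_2}}\big)^2$$
$$\le C (l+1)^2 \sum_{d_1=0}^{l} \sum_{d_2=0}^{l} \sum_{i=1}^{2^{l-d_1}} \sum_{j=1}^{2^{l-d_2}} 2^{d_1} 2^{d_2} \le C\, l^4\, 2^{2l},$$

where the constant $C$ may change from one inequality to the next.
So the first part of the lemma is proved. For the second part, it suffices to show that

$$\max_{0\le m_1<n_1\le m_2<n_2\le 2^l} |Q_{m_1,n_1,m_2,n_2}| = o(2^l\, l^3)$$

almost surely. Now by the Chebyshev inequality, we obtain for every $\epsilon > 0$

$$\sum_{l=1}^{\infty} P\Big(\frac{1}{2^l l^3} \max_{0\le m_1<n_1\le m_2<n_2\le 2^l} |Q_{m_1,n_1,m_2,n_2}| \ge \epsilon\Big) \le \sum_{l=1}^{\infty} \frac{1}{\epsilon^2 2^{2l} l^6} E\Big(\max_{0\le m_1<n_1\le m_2<n_2\le 2^l} |Q_{m_1,n_1,m_2,n_2}|\Big)^2 \le \sum_{l=1}^{\infty} \frac{C}{\epsilon^2 l^2} < \infty.$$

The Borel-Cantelli lemma completes the proof. □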
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In order to prove Theorem 12.61 we need some information about the local behaviour of the 
empirical t/-process. We will hrst concentrate on the hrst half of the process, i.e. A G [ 0 , 1 / 2 ]: 

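Lemma 5.3. Under the assumptions (A1), (A2), and (A3),

\[ \sup_{\substack{\lambda \in [0,1/2] \\ |t - t_p| \le C \sqrt{\frac{\log\log(\min\{\lambda, 1-\lambda\} n)}{\min\{\lambda, 1-\lambda\} n}}}} \lambda(1-\lambda) \left| \left( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t) - U(t) \right) - \left( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right) \right| = O\left( n^{-5/9} \right) \]

almost surely.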

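Proof. We define $n_1 = \lfloor n\lambda \rfloor$ and $n_2 = n - \lfloor n\lambda \rfloor$, and note that $n_1 + n_2 = n$. We define the
sequence $c_{2^l} = 2^{-5l/9}$, and for $n = 2^{l-1}+1, \ldots, 2^l$ we set $c_n = c_{2^l}$. By the monotonicity of
$U$ in $t$, we have that

\begin{align*}
\sup_{\substack{n_1 \le n/2 \\ |t - t_p| \le C\sqrt{\log\log n_1 / n_1}}} \frac{n_1 n_2}{n^2} \left| (U_{n_1,n_2}(t) - U(t)) - (U_{n_1,n_2}(t_p) - p) \right|
&\le \max_{\substack{n_1 \le n/2,\ t \in c_n\mathbb{Z} \\ |t - t_p| \le C\sqrt{\log\log n_1 / n_1}}} \frac{n_1 n_2}{n^2} \left| (U_{n_1,n_2}(t) - U(t)) - (U_{n_1,n_2}(t_p) - p) \right| \\
&\quad + \max_{\substack{n_1 \le n/2,\ t \in c_n\mathbb{Z} \\ |t - t_p| \le C\sqrt{\log\log n_1 / n_1}}} |U(t) - U(t + c_n)|.
\end{align*}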


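As $U$ is differentiable at $t_p$, we get that the second summand is of the order $O(c_n)$. For the
first summand, we use the Hoeffding decomposition and get

\begin{align}
\max \frac{n_1 n_2}{n^2} \Big| (U_{n_1,n_2}(t) - U(t)) - (U_{n_1,n_2}(t_p) - p) \Big|
&\le \max \frac{n_1 n_2}{n^2} \Big| \frac{1}{n_1} \sum_{i=1}^{n_1} h_1(X_i, t) - \frac{1}{n_1} \sum_{i=1}^{n_1} h_1(X_i, t_p) \Big| \notag \\
&\quad + \max \frac{n_1 n_2}{n^2} \Big| \frac{1}{n_2} \sum_{j=n_1+1}^{n} h_2(X_j, t) - \frac{1}{n_2} \sum_{j=n_1+1}^{n} h_2(X_j, t_p) \Big| \notag \\
&\quad + \max \frac{1}{n^2} \Big| \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n_1+n_2} g(X_i, X_j, t) \Big| \notag \\
&\quad + \max \frac{1}{n^2} \Big| \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n_1+n_2} g(X_i, X_j, t_p) \Big|, \tag{5}
\end{align}

where every maximum is taken over $n_1 \le n/2$ and $t \in c_n\mathbb{Z}$ with $|t - t_p| \le C\sqrt{\log\log n_1 / n_1}$.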


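For the first summand, we refer to (13) in Theorem 1 of Wendler [31] and conclude that it
is of size $o\big(n^{-\frac{5+\gamma}{8}} (\log n)^{\frac{3}{4}} (\log\log n)^{\frac{1}{2}}\big) = O(n^{-5/9})$ almost surely for a $\gamma > 0$. Note that the
continuity condition on the kernel in [31] is different, but the continuity is only needed to
guarantee that $(h_1(X_i))_{i \in \mathbb{N}}$ is near epoch dependent. This also holds under our continuity
condition by Proposition 2.11 from Borovkova et al. [5].

We split the second summand into two parts, so that for the first part $n_1/n$ is small and
for the second part the range of $t$ is small:

\begin{align*}
\max_{\substack{n_1 \le n/2 \\ t \in c_n\mathbb{Z}}} \frac{n_1 n_2}{n^2} \Big| \frac{1}{n_2} \sum_{j=n_1+1}^{n} h_2(X_j, t) - \frac{1}{n_2} \sum_{j=n_1+1}^{n} h_2(X_j, t_p) \Big|
&\le \max_{\substack{n_1 \le n^{4/9} \\ t \in c_n\mathbb{Z}}} \frac{n_1}{n^2} \Big| \sum_{j=n_1+1}^{n} h_2(X_j, t) - \sum_{j=n_1+1}^{n} h_2(X_j, t_p) \Big| \\
&\quad + \max_{\substack{n^{4/9} < n_1 \le n/2 \\ t \in c_n\mathbb{Z}}} \frac{n_1 n_2}{n^2} \Big| \frac{1}{n_2} \sum_{j=n_1+1}^{n} h_2(X_j, t) - \frac{1}{n_2} \sum_{j=n_1+1}^{n} h_2(X_j, t_p) \Big| \\
&=: A_1 + A_2,
\end{align*}

where $t$ is additionally restricted to $|t - t_p| \le C\sqrt{\log\log n_1 / n_1}$ in each maximum. As $h$ is bounded and therefore $h_2$ is bounded, we have that $A_1 = O(n^{4/9 - 1}) = O(n^{-5/9})$. Along the lines
of the proof of Theorem 1 in Wendler [31], we obtain

\[ A_2 = o\big( n^{-\frac{1}{2} - \frac{2}{9} \cdot \frac{1+\gamma}{4}} (\log n)^{\frac{3}{4}} (\log\log n)^{\frac{1}{2}} \big) = O(n^{-5/9}) \]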

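almost surely. For the third summand on the r.h.s. of (5), we use the first part of Lemma
5.2, the Chebyshev inequality, and the fact that the second moment of the maximum of
random variables is smaller or equal to the sum of second moments. We obtain

\begin{align*}
&\sum_{l=1}^{\infty} P\Big( \frac{1}{c_{2^l}} \max_{2^{l-1} < n \le 2^l}\ \max_{\substack{n_1 \le n/2,\ t \in c_n\mathbb{Z} \\ |t - t_p| \le C}} \frac{1}{n^2} \Big| \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n_1+n_2} g(X_i, X_j, t) \Big| > \epsilon \Big) \\
&\quad \le \sum_{l=1}^{\infty} \frac{1}{\epsilon^2 c_{2^l}^2\, 2^{4(l-1)}}\, E\Big( \max_{2^{l-1} < n \le 2^l}\ \max_{\substack{n_1 \le n/2,\ t \in c_n\mathbb{Z} \\ |t - t_p| \le C}} \Big| \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n_1+n_2} g(X_i, X_j, t) \Big| \Big)^2 \\
&\quad \le \sum_{l=1}^{\infty} \frac{1}{\epsilon^2 c_{2^l}^2\, 2^{4(l-1)}} \sum_{\substack{t \in c_{2^l}\mathbb{Z} \\ |t - t_p| \le C}} E\Big( \max_{0 < n_1 < n_1 + n_2 \le 2^l} \Big| \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n_1+n_2} g(X_i, X_j, t) \Big| \Big)^2 \\
&\quad \le C \sum_{l=1}^{\infty} \frac{l^4\, 2^{2l}}{c_{2^l}^3\, 2^{4l}} = C \sum_{l=1}^{\infty} \frac{l^4}{2^{l/3}} < \infty,
\end{align*}

as the set $\{t \in c_{2^l}\mathbb{Z},\ |t - t_p| \le C\}$ has at most $C c_{2^l}^{-1}$ elements. Using the Borel-Cantelli
lemma, we conclude that the third summand on the r.h.s. of (5) is of size $o(c_n)$ almost surely.
The last summand can be treated in the same way and so in total we have proved the order
$O(n^{-5/9})$ almost surely. □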

Lemma 5.4. Under the assumptions (A1) and (A3),

\[ \sup_{n_2 \ge n_1} \left| U_{n_1,n_2}(t_p) - p \right| = O\left( \sqrt{\frac{\log\log n_1}{n_1}} \right) \]

almost surely.

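Proof. We use the Hoeffding decomposition

\begin{align*}
\sup_{n_2 \ge n_1} \left| U_{n_1,n_2}(t_p) - p \right|
&\le \Big| \frac{1}{n_1} \sum_{i=1}^{n_1} h_1(X_i, t_p) \Big| + \sup_{n_2 \ge n_1} \frac{1}{n_2} \Big| \sum_{i=1}^{n_1} h_2(X_i, t_p) \Big| \\
&\quad + \sup_{n_2 \ge n_1} \frac{1}{n_2} \Big| \sum_{j=1}^{n_1+n_2} h_2(X_j, t_p) \Big| + \sup_{n_2 \ge n_1} \frac{1}{n_1 n_2} \Big| \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n_1+n_2} g(X_i, X_j, t_p) \Big|.
\end{align*}

For the first two summands, we use Proposition 3.7 of Wendler [31], which leads to

\[ \frac{1}{n_1} \sum_{i=1}^{n_1} h_k(X_i, t_p) = O\left( \sqrt{\frac{\log\log n_1}{n_1}} \right) \]

for $k = 1, 2$ almost surely. Furthermore

\[ \sup_{n_2 \ge n_1} \frac{1}{n_2} \Big| \sum_{j=1}^{n_1+n_2} h_2(X_j, t_p) \Big| \le \sup_{n_2 \ge n_1} \frac{2}{n_1 + n_2} \Big| \sum_{j=1}^{n_1+n_2} h_2(X_j, t_p) \Big| = O\left( \sqrt{\frac{\log\log n_1}{n_1}} \right). \]

For the last summand, we use Lemma 5.2 to obtain

\[ E\Big( \max_{\substack{0 \le m_1 < n_1 \le 2^{l_1} \\ n_1 \le m_2 < n_2 \le 2^{l_2}}} |Q_{m_1,n_1,m_2,n_2}| \Big)^2 \le C\, l_1^2\, l_2^2\, 2^{l_1 + l_2}. \]

Now by the Chebyshev inequality, we obtain

\begin{align*}
\sum_{l_2=1}^{\infty} \sum_{l_1=1}^{l_2} P\Big( \frac{1}{\sqrt{2^{l_1} \log l_1}\; 2^{l_2}} \max_{\substack{0 \le m_1 < n_1 \le 2^{l_1} \\ n_1 \le m_2 < n_2 \le 2^{l_2}}} |Q_{m_1,n_1,m_2,n_2}| > \epsilon \Big)
&\le \frac{1}{\epsilon^2} \sum_{l_2=1}^{\infty} \sum_{l_1=1}^{l_2} \frac{1}{2^{l_1} \log l_1\; 2^{2 l_2}}\, E\Big( \max_{\substack{0 \le m_1 < n_1 \le 2^{l_1} \\ n_1 \le m_2 < n_2 \le 2^{l_2}}} |Q_{m_1,n_1,m_2,n_2}| \Big)^2 \\
&\le \frac{C}{\epsilon^2} \sum_{l_2=1}^{\infty} \sum_{l_1=1}^{l_2} \frac{l_1^2\, l_2^2}{2^{l_2} \log l_1} < \infty,
\end{align*}

so we can conclude that the last summand is of the required order almost surely. □
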
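Proposition 5.5. Under the assumptions (A1), (A2) and (A3), the process

\[ \left( \sqrt{n}\, \lambda(1-\lambda) \left( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(Q(p)) - p \right) \right)_{\lambda \in [0,1]} \]

converges weakly to

\[ \left( (1-\lambda) W_1(\lambda) + \lambda (W_2(1) - W_2(\lambda)) \right)_{\lambda \in [0,1]}, \]

where $W = (W_1, W_2)$ is a two-dimensional Brownian motion with covariance structure

\[ \operatorname{Cov}(W_i(\mu), W_j(\lambda)) = (\mu \wedge \lambda) \sum_{k \in \mathbb{Z}} E\left( h_i(X_0; Q(p))\, h_j(X_k; Q(p)) \right). \]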

This is Theorem 2.4 of Dehling et al. (2013). 

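Lemma 5.6. Under the assumptions (A1), (A2) and (A3) for any bandwidth $b = b_n$ with
$b_n \to 0$ and $n b_n^3 \to \infty$,

\[ \hat{u}(0) = \frac{2}{n(n-1)b} \sum_{1 \le i < j \le n} K\left( \frac{X_i - X_j}{b} \right) \to u(0) \]

in probability.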
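To make the estimator in Lemma 5.6 concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the Epanechnikov kernel is an assumed choice of $K$ (it has bounded support, is Lipschitz continuous and integrates to 1, as the proof requires), the function names are hypothetical, and the i.i.d. Gaussian sample is only a convenient stand-in for a time series.

```python
import math
import random


def epanechnikov(x):
    """Bounded-support, Lipschitz kernel integrating to 1 (assumed choice of K)."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def u_hat_zero(xs, b, kernel=epanechnikov):
    """Kernel estimate of the density of X_i - X_j at 0 with bandwidth b."""
    n = len(xs)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += kernel((xs[i] - xs[j]) / b)
    # Normalisation 2 / (n (n - 1) b) as in the lemma.
    return 2.0 * total / (n * (n - 1) * b)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
    # For i.i.d. N(0,1) data, X_i - X_j ~ N(0,2) with density 1/(2 sqrt(pi)) at 0.
    print(u_hat_zero(sample, b=0.2), 1.0 / (2.0 * math.sqrt(math.pi)))
```

The double loop costs $O(n^2)$ kernel evaluations, which is fine for moderate $n$; the bandwidth value 0.2 is arbitrary and chosen only for the demonstration.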
Proof. First note that $\hat{u}(0)$ is a one-sample $U$-statistic with symmetric kernel $k_n(x, y) = \frac{1}{b} K\left( \frac{x - y}{b} \right)$
depending on $n$. We use the Hoeffding decomposition

\begin{align*}
U_n &= E\, k_n(X, Y), \\
k_{1,n}(x) &= E\, k_n(x, X) - U_n, \\
k_{2,n}(x, y) &= k_n(x, y) - k_{1,n}(x) - k_{1,n}(y) - U_n,
\end{align*}

where $X, Y$ are independent with the same distribution as $X_1$. We obtain

\[ \hat{u}(0) = U_n + \frac{2}{n} \sum_{i=1}^{n} k_{1,n}(X_i) + \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} k_{2,n}(X_i, X_j). \]

By our assumptions, $K$ has a bounded support, so let $K(x) = 0$ for $|x| > M$. Because the
density $u$ is continuous and $K$ integrates to 1, we can conclude that

\[ |U_n - u(0)| = \Big| \int \frac{1}{b} K\left( \frac{x}{b} \right) u(x)\, dx - u(0) \Big| \le \sup_{|x| \le Mb} |u(x) - u(0)| \to 0, \]

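since $b_n \to 0$. As $K$ is Lipschitz continuous, i.e. $|K(x) - K(y)| \le L_1 |x - y|$ for some constant
$L_1$, we have that $k_{1,n}(x)$ is Lipschitz continuous with constant $L_1/b^2$. By Proposition 2.11 of
Borovkova et al. [5], it follows that $(k_{1,n}(X_i))_{i \in \mathbb{N}}$ is near epoch dependent with approximation
constants $a_k' = \frac{L_1}{b^2} a_k$. Let $C_1 = C/b$ be the upper bound of $|k_{1,n}|$, then by Lemma
2.18 of Borovkova et al.

\[ |E(k_{1,n}(X_i) k_{1,n}(X_j))| \le 4 C_1 a'_{|i-j|/3} + 2 C_1^2 \beta(|i-j|/3) \le \frac{C}{b^3} \left( a_{|i-j|/3} + \beta(|i-j|/3) \right), \]

so we obtain by stationarity that

\[ E\Big( \frac{2}{n} \sum_{i=1}^{n} k_{1,n}(X_i) \Big)^2 \le \frac{C}{n} \sum_{i=1}^{n} |E(k_{1,n}(X_1) k_{1,n}(X_i))| \le \frac{C}{n b^3} \sum_{i=0}^{\infty} \left( a_{i/3} + \beta(i/3) \right) \to 0, \]

because $n b^3 \to \infty$, so the second summand converges to 0. For the third summand, we use
Lemma 4.3 of Borovkova et al. and the fact that $k_{2,n}(x, y)$ is a degenerate kernel bounded by
$4C/b$ and that the product $k_{2,n}(x_1, x_2) k_{2,n}(x_3, x_4)$ is $P$-Lipschitz with constant
$C b^{-3}$. We get the inequality

\[ |E(k_{2,n}(X_{i_1}, X_{i_2}) k_{2,n}(X_{i_3}, X_{i_4}))| \le \frac{C}{b^2} \left( A_{m/3} + \beta(m/3) \right) + \frac{C}{b^3} A_{m/3} \]

with $A_l = \sqrt{2 \sum_{i=l}^{\infty} a_i}$ and $m = \max\{ i_{(2)} - i_{(1)},\ i_{(4)} - i_{(3)} \}$, where $i_{(1)} \le i_{(2)} \le i_{(3)} \le i_{(4)}$
are the order statistics of the indices $i_1, i_2, i_3, i_4$. Thus, we obtain

\begin{align*}
E\Big( \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} k_{2,n}(X_i, X_j) \Big)^2
&\le \frac{C}{n^4} \sum_{i_1, i_2, i_3, i_4 = 1}^{n} |E(k_{2,n}(X_{i_1}, X_{i_2}) k_{2,n}(X_{i_3}, X_{i_4}))| \\
&= \frac{C}{n^4} \sum_{m=0}^{n} \sum_{\substack{i_1, i_2, i_3, i_4 \\ \max\{ i_{(2)} - i_{(1)},\, i_{(4)} - i_{(3)} \} = m}} |E(k_{2,n}(X_{i_1}, X_{i_2}) k_{2,n}(X_{i_3}, X_{i_4}))| \\
&\le \frac{C}{n^4 b^3} \sum_{m=0}^{n} \sum_{\substack{i_1, i_2, i_3, i_4 \\ \max\{ i_{(2)} - i_{(1)},\, i_{(4)} - i_{(3)} \} = m}} \left( A_{m/3} + \beta(m/3) \right).
\end{align*}

At this point, we have to calculate the number of quadruples $(i_1, i_2, i_3, i_4)$ such that $\max\{ i_{(2)} - i_{(1)},\ i_{(4)} - i_{(3)} \} = m$. First note that there are at most 6 quadruples which lead to the same
ordered numbers $i_{(1)}, i_{(2)}, i_{(3)}, i_{(4)}$. There are at most $n^2$ possibilities to choose $i_{(1)}$ and $i_{(4)}$.
If $i_{(2)} - i_{(1)} = \max\{ i_{(2)} - i_{(1)},\ i_{(4)} - i_{(3)} \} = m$, then $i_{(2)}$ is already fixed and there are at most $m + 1$
possibilities for $i_{(3)}$. The same argument applies if $i_{(4)} - i_{(3)} = \max\{ i_{(2)} - i_{(1)},\ i_{(4)} - i_{(3)} \} = m$,
so we finally obtain

\[ E\Big( \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} k_{2,n}(X_i, X_j) \Big)^2 \le \frac{C}{n^2 b^3} \sum_{m=0}^{n} (m+1) \left( A_{m/3} + \beta(m/3) \right) \to 0, \]

as the $m A_{m/3}$ and $m \beta(m/3)$ are summable by assumption (A1), and $n^2 b^3 \to \infty$. □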

Lemma 5.7. Let $G$ be a non-decreasing function, $c, l > 0$ constants and $[C_1, C_2] \subset \mathbb{R}$. If for
all $t, t' \in [C_1, C_2]$ with $|t - t'| \le l + 2c$

\[ |G(t) - G(t') - (t - t')| \le c, \]

then for all $p, p' \in \mathbb{R}$ with $|p - p'| \le l$ and $G^{-1}(p), G^{-1}(p') \in (C_1 + 2c + l,\ C_2 - 2c - l)$

\[ |G^{-1}(p) - G^{-1}(p') - (p - p')| \le c, \]

where $G^{-1}(p) := \inf\{ t \mid G(t) \ge p \}$ is the generalized inverse.
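As an illustration of the generalized inverse $G^{-1}(p) = \inf\{ t \mid G(t) \ge p \}$, the following Python sketch evaluates it for a step function. It assumes, purely for the example, that $G$ is the empirical distribution function of a finite list of values (as is the case for $U_{n_1,n_2}$ with an indicator kernel); the function names are hypothetical.

```python
def ecdf(values, t):
    """Empirical distribution function G(t) = #{v <= t} / N."""
    return sum(v <= t for v in values) / len(values)


def generalized_inverse(values, p):
    """Return inf{t : G(t) >= p} for the empirical d.f. G of `values`, 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # G only jumps at data points, so the infimum is attained at one of them.
    for t in sorted(values):
        if ecdf(values, t) >= p:
            return t
    raise ValueError("unreachable for 0 < p <= 1")


print(generalized_inverse([3, 1, 4, 1, 5], 0.5))  # 3
```

Because $G$ is a right-continuous step function here, the infimum is a minimum; for a general non-decreasing $G$ this need not hold, which is why the lemma is phrased with an infimum.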

Proof. This is Lemma 3.5 of Wendler [35]. □ 


5.2. Proof of the Main Theorems. 


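Proof of Theorem 2.6. Without loss of generality, we can assume that $u(t_p) = 1$, otherwise
replacing $h(x, y, t)$ by $h(x, y, t/u(t_p))$. We will first concentrate on the first half, that means we
will investigate

\begin{align*}
&\sup_{\lambda \in [0, \frac{1}{2}]} \lambda(1-\lambda) \left| U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - t_p + U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right| \\
&\quad \le \sup_{\lambda \in [0, \frac{1}{2}]} \lambda(1-\lambda) \left| U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}\big( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) \big) + U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right| \\
&\qquad + \sup_{\lambda \in [0, \frac{1}{2}]} \lambda(1-\lambda) \left| U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}\big( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) \big) - t_p \right|.
\end{align*}

By Lemma 5.4 we can choose $C_1 > 0$, such that for all $n$

\[ P\Big( \sup_{\lambda \in [0, \frac{1}{2}]} \left| U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right| \Big/ \sqrt{\frac{\log\log(\lambda n)}{\lambda n}} > C_1 \Big) < \epsilon. \]

Hence, using Lemmas 5.3 and 5.7 there exists a constant $C_2$ such that

\begin{align*}
&P\Big( \sup_{\lambda \in [0, \frac{1}{2}]} \lambda(1-\lambda) \left| U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}\big( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) \big) + U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right| > C_2 n^{-5/9} \Big) \\
&\quad \le P\Big( \sup_{\substack{\lambda \in [0, \frac{1}{2}] \\ |p - p'| \le C_1 \sqrt{\log\log(\lambda n)/(\lambda n)}}} \lambda(1-\lambda) \left| U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p') + p' - p \right| > C_2 n^{-5/9} \Big) \\
&\qquad + P\Big( \sup_{\lambda \in [0, \frac{1}{2}]} \left| U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right| \Big/ \sqrt{\frac{\log\log(\lambda n)}{\lambda n}} > C_1 \Big) \\
&\quad \le P\Big( \sup_{\substack{\lambda \in [0, \frac{1}{2}] \\ |t - t_p| \le C \sqrt{\log\log(\lambda n)/(\lambda n)}}} \lambda(1-\lambda) \left| U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t) - U(t) - U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) + p \right| > C n^{-5/9} \Big) + \epsilon \\
&\quad \le 2\epsilon.
\end{align*}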

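Thus, the first summand is of order $O_P(n^{-5/9})$. It remains to show the convergence of the
second summand $U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}\big( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) \big) - t_p$. By the definition of the generalized
inverse, $U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}\big( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) \big) - t_p \le 0$. Furthermore, if $U_{n_1,n_2}(t) < U_{n_1,n_2}(t_p)$, by the
monotonicity of $h$, we have for all $n_2' \ge n_2$ that $U_{n_1,n_2'}(t) < U_{n_1,n_2'}(t_p)$. As $U^{-1}_{n_1,n_2}(U_{n_1,n_2}(t_p))$
is the supremum of all $t$ such that $U_{n_1,n_2}(t) < U_{n_1,n_2}(t_p)$, it follows that $U^{-1}_{n_1,n_2}(U_{n_1,n_2}(t_p))$ is
nondecreasing in $n_2$.

For every $c > 0$ it holds that $U^{-1}_{n_1,n_1}(U_{n_1,n_1}(t_p)) - t_p < -c$ only if $U_{n_1,n_1}(t_p - c) - U_{n_1,n_1}(t_p) \ge 0$,
which is equivalent to

\[ U_{n_1,n_1}(t_p - c) - U(t_p - c) - U_{n_1,n_1}(t_p) + p \ge -U(t_p - c) + p. \]

By Lemma 5.3, there is a constant $C_3$ such that

\[ P\Big( \sup_{n_1 \in \mathbb{N}}\ \sup_{|t - t_p| \le C\sqrt{\log\log n_1 / n_1}} n_1^{5/9} \left| U_{n_1,n_1}(t) - U(t) - (U_{n_1,n_1}(t_p) - p) \right| > C_3 \Big) < \epsilon. \]

As $U$ is differentiable, we have that $-U(t_p - C_4 n_1^{-5/9}) + p > C_3 n_1^{-5/9}$ for some constant $C_4$ and
consequently, on the complement of the event above, for all $n_2 \ge n_1$

\[ U^{-1}_{n_1,n_2}(U_{n_1,n_2}(t_p)) - t_p \ge -C_4 n_1^{-5/9}. \]

Finally we have that $\lambda(1-\lambda) \lfloor \lambda n \rfloor^{-5/9} \le n^{-5/9}$, and so we arrive at

\begin{align*}
&P\Big( \sup_{\lambda \in [0, \frac{1}{2}]} \lambda(1-\lambda) \left| U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}\big( U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) \big) - t_p \right| > C_4 n^{-5/9} \Big) \\
&\quad \le P\Big( \sup_{n_1 \le n/2} n_1^{5/9} \left| U^{-1}_{n_1,n_1}(U_{n_1,n_1}(t_p)) - t_p \right| > C_4 \Big) \\
&\quad \le P\Big( \sup_{n_1 \in \mathbb{N}} n_1^{5/9} \Big( U_{n_1,n_1}(t_p - C_4 n_1^{-5/9}) - U(t_p - C_4 n_1^{-5/9}) - (U_{n_1,n_1}(t_p) - p) \Big) > C_3 \Big) \\
&\quad \le P\Big( \sup_{n_1 \in \mathbb{N}}\ \sup_{|t - t_p| \le C\sqrt{\log\log n_1 / n_1}} n_1^{5/9} \left| U_{n_1,n_1}(t) - U(t) - (U_{n_1,n_1}(t_p) - p) \right| > C_3 \Big) < \epsilon.
\end{align*}

So we have shown the convergence in probability for $\lambda$ restricted to $[0, \frac{1}{2}]$. For the second
half ($\lambda \in [1/2, 1]$), note that

\begin{align*}
&\sup_{\lambda \in [\frac{1}{2}, 1]} \lambda(1-\lambda) \left| U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - t_p + U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right| \\
&\quad = \sup_{\lambda \in [0, \frac{1}{2}]} \lambda(1-\lambda) \left| \tilde{U}^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - t_p + \tilde{U}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right|,
\end{align*}

where $\tilde{U}_{n_1,n_2}$ is the two-sample $U$-statistic with kernel $\tilde{h}(x, y, t) = h(y, x, t)$ calculated for
the stochastic process $(\tilde{X}_i)_{i \in \mathbb{Z}}$ with $\tilde{X}_i = X_{n-i}$. Because of the stationarity, the probability
distribution of this does not change if we insert the random variables $X_i' = X_{-i}$ instead. The
process $(X_{-i})_{i \in \mathbb{Z}}$ inherits the near epoch properties of $(X_i)_{i \in \mathbb{Z}}$. And with the same arguments
as above, it follows that

\[ \sup_{\lambda \in [0, \frac{1}{2}]} \lambda(1-\lambda) \left| \tilde{U}^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - t_p + \tilde{U}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p \right| = O_P\left( n^{-5/9} \right). \]

□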


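Proof of Theorem 2.5. We decompose the stochastic process into two parts:

\begin{align*}
\left( \sqrt{n}\, \lambda(1-\lambda) \left( U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - t_p \right) \right)_{\lambda \in [0,1]}
&= \left( \sqrt{n}\, \lambda(1-\lambda)\, \frac{p - U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p)}{u(t_p)} \right)_{\lambda \in [0,1]} \\
&\quad + \left( \sqrt{n}\, \lambda(1-\lambda) \left( U^{-1}_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(p) - t_p + \frac{U_{\lfloor \lambda n \rfloor, n - \lfloor \lambda n \rfloor}(t_p) - p}{u(t_p)} \right) \right)_{\lambda \in [0,1]}.
\end{align*}

By Theorem 2.6 the second part converges to zero in supremum norm. As a consequence
of Proposition 5.5 the first part converges weakly to

\[ \left( (1-\lambda) W_1(\lambda) + \lambda (W_2(1) - W_2(\lambda)) \right)_{\lambda \in [0,1]}, \]

where $W = (W_1, W_2)$ is a two-dimensional Brownian motion with covariance structure

\[ \operatorname{Cov}(W_i(\mu), W_j(\lambda)) = (\mu \wedge \lambda)\, \frac{1}{u^2(Q(p))} \sum_{k \in \mathbb{Z}} E\left( h_i(X_0; Q(p))\, h_j(X_k; Q(p)) \right). \]

□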


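Proof of Theorem 2.1. By Theorem 2.5

\[ \sqrt{n}\, \lambda(1-\lambda)\, \operatorname{median}\left\{ X_j - X_i \,\middle|\, 1 \le i \le \lfloor n\lambda \rfloor,\ \lfloor n\lambda \rfloor + 1 \le j \le n \right\} \]

converges to

\[ \left( (1-\lambda) W_1(\lambda) + \lambda (W_2(1) - W_2(\lambda)) \right)_{\lambda \in [0,1]}, \]

where $W = (W_1, -W_1)$ and $W_1$ is a Brownian motion, as $h_1(x, 0) = -h_2(x, 0)$. The variance
is $\operatorname{Var}(W_1(1)) = \frac{\sigma^2}{u^2(0)}$. Now

\[ \frac{u(0)}{\sigma} \left( (1-\lambda) W_1(\lambda) + \lambda(-W_1(1) + W_1(\lambda)) \right) = \frac{u(0)}{\sigma} W_1(\lambda) - \lambda \frac{u(0)}{\sigma} W_1(1) \]

is a Brownian bridge. Finally, by Lemma 5.6 and Theorem 1.2 of Dehling et al. [15],
$\hat{u}(0)/\hat{\sigma} \to u(0)/\sigma$ in probability, which completes the proof. □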
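For orientation, the process studied in this proof can be computed directly from data. The Python sketch below is a simplified illustration and not the authors' code: it evaluates only the unnormalised quantity $\sqrt{n}\, \lambda(1-\lambda) \operatorname{median}\{X_j - X_i\}$ at $\lambda = k/n$ and omits the factor $\hat{u}(0)/\hat{\sigma}$, since the long run variance estimator is not specified in this section. The function name `hl_shift_process` is hypothetical.

```python
import math
from statistics import median


def hl_shift_process(x):
    """sqrt(n) * (k/n) * (1 - k/n) * median{x[j] - x[i] : i < k <= j}, k = 1..n-1.

    Uses 0-based indices: the first k observations form the first sample.
    """
    n = len(x)
    values = []
    for k in range(1, n):
        diffs = [x[j] - x[i] for i in range(k) for j in range(k, n)]
        lam = k / n
        values.append(math.sqrt(n) * lam * (1.0 - lam) * median(diffs))
    return values


if __name__ == "__main__":
    # Toy series with a level shift of size 5 after the 20th observation.
    series = [0.0] * 20 + [5.0] * 20
    proc = hl_shift_process(series)
    k_hat = max(range(len(proc)), key=lambda idx: abs(proc[idx])) + 1
    print(k_hat)  # 20
```

This brute-force version recomputes all pairwise differences for every split point and therefore needs $O(n^3)$ operations; it is meant to show the definition, not an efficient algorithm.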